Dataset quality control was performed with DADA2's built-in QC commands. The mini-pipeline left-trims the first 15 bp of each read (suggested for IonTorrent data), discards reads shorter than 50 bp, denoises the reads, and finally removes chimeric reads to increase efficiency. FastQ Screen was used to map the QC-filtered FASTQ reads against reference contaminant genomes (hg19, mm10).
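
The trimming and length filter described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the DADA2 implementation; the function name `trim_and_filter` and its parameters are hypothetical, and it assumes the 50 bp length check is applied after the 15 bp left trim.

```python
def trim_and_filter(reads, trim_left=15, min_len=50):
    """Left-trim each read, then drop reads that end up shorter than min_len.

    reads: list of (sequence, quality_string) pairs.
    Assumption: the minimum-length check is applied after trimming.
    """
    kept = []
    for seq, qual in reads:
        # Remove the first `trim_left` bases and their quality scores.
        seq, qual = seq[trim_left:], qual[trim_left:]
        if len(seq) >= min_len:
            kept.append((seq, qual))
    return kept


# Hypothetical reads: 70 bp (55 bp after trimming) and 60 bp (45 bp after trimming).
example_reads = [("A" * 70, "I" * 70), ("C" * 60, "I" * 60)]
filtered_reads = trim_and_filter(example_reads)
```

With these two made-up reads, only the first one survives, because the second falls below 50 bp once its first 15 bases are removed.
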
| Sample | Total Input | QC Filtered | Denoised | Chimeras Filtered | #Bacteria | %Bacteria | #Human | %Human | #Mouse | %Mouse |
|---|---|---|---|---|---|---|---|---|---|---|
| Gut_C3_1_merged | 998735 | 828496 | 814390 | 722564 | 602787 | 83.4% | 7675 | 0.9% | 130728 | 15.8% |
| Gut_C3_4_merged | 4108448 | 3150534 | 2950625 | 2271789 | 1695998 | 74.7% | 945042 | 30.0% | 95385 | 3.0% |
| Liver_C3_1_merged | 449654 | 403739 | 388635 | 335347 | 304382 | 90.8% | 7232 | 1.8% | 39075 | 9.7% |
| Liver_C3_2_merged | 2199911 | 2001177 | 1958592 | 1845465 | 707620 | 38.3% | 62379 | 3.1% | 1189659 | 59.4% |
| Lung_C3_3_merged | 1317409 | 1042628 | 1035637 | 943925 | 941161 | 99.7% | 3667 | 0.4% | 1888 | 0.2% |
| Lung_F3_1_merged | 1838274 | 1606855 | 1593833 | 1379285 | 1294395 | 93.8% | 11365 | 0.7% | 88189 | 5.5% |
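
The report does not state which read count each percentage column is relative to. Recomputing from the table suggests that %Bacteria is relative to the chimera-filtered count, while %Human and %Mouse are relative to the QC-filtered count. The sketch below reproduces that check; the denominators are an inference from the numbers, and the helper name `recompute` is hypothetical.

```python
# Counts copied from the table above, as
# (qc_filtered, chimeras_filtered, bacteria, human, mouse).
rows = {
    "Gut_C3_1_merged": (828496, 722564, 602787, 7675, 130728),
    "Gut_C3_4_merged": (3150534, 2271789, 1695998, 945042, 95385),
    "Liver_C3_1_merged": (403739, 335347, 304382, 7232, 39075),
    "Liver_C3_2_merged": (2001177, 1845465, 707620, 62379, 1189659),
    "Lung_C3_3_merged": (1042628, 943925, 941161, 3667, 1888),
    "Lung_F3_1_merged": (1606855, 1379285, 1294395, 11365, 88189),
}


def recompute(row):
    """Return (%bacteria, %human, %mouse) under the inferred denominators."""
    qc_filtered, chimeras_filtered, bacteria, human, mouse = row
    pct_bacteria = 100.0 * bacteria / chimeras_filtered  # inferred denominator
    pct_human = 100.0 * human / qc_filtered              # inferred denominator
    pct_mouse = 100.0 * mouse / qc_filtered              # inferred denominator
    return round(pct_bacteria, 1), round(pct_human, 1), round(pct_mouse, 1)


recomputed = {sample: recompute(row) for sample, row in rows.items()}
```

Because the denominators differ, the three percentages in a row are not expected to sum to 100%.
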
Read mapping distributions
The gray-scale heat map shows the frequency of each quality score at each base position. The green line shows the mean quality score at each position, and the orange lines show the quartiles of the quality score distribution. The red line shows the scaled proportion of reads that extend to at least that position.
The error rates for each possible transition (A→C, A→G, …) are shown. Points are the observed error rates for each consensus quality score. The black line shows the estimated error rates after convergence of the machine-learning algorithm. The red line shows the error rates expected under the nominal definition of the Q-score.
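
For reference, the nominal (Phred) definition of the Q-score relates a quality score Q to an expected error probability p by p = 10^(-Q/10). The short sketch below evaluates that relation for a few scores; the function name `nominal_error_rate` is hypothetical and the snippet is independent of DADA2.

```python
def nominal_error_rate(q):
    """Expected error probability under the nominal Phred definition."""
    return 10 ** (-q / 10)


# Expected error probabilities for a few illustrative quality scores.
nominal_rates = {q: nominal_error_rate(q) for q in (10, 20, 30, 40)}
```

A score of 20 corresponds to one expected error in 100 bases, and a score of 30 to one in 1000.
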
Taxonomic analysis was run according to the standard phyloseq Bioconductor package pipeline to generate the following five basic visualisations.
Alpha diversity visualizes how many different species could be detected in a microbial ecosystem.
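
As a rough illustration of what alpha diversity measures, the sketch below computes two common indices from a vector of per-species counts: observed richness and the Shannon index. The report does not say which indices were plotted, so both choices are assumptions, and the counts are made up.

```python
import math


def observed_richness(counts):
    """Number of species with at least one read."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over species with p_i > 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts for five species in one sample.
sample_counts = [120, 30, 0, 45, 5]
richness = observed_richness(sample_counts)
shannon = shannon_index(sample_counts)
```

Richness only counts how many species are present, whereas the Shannon index also grows when the reads are spread more evenly across them.
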
Beta diversity depicts how different the microbial composition of one environment is from that of another, based on the Order of each species. Samples and species are shown in two side-by-side panels.
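
Beta diversity plots are built on a pairwise dissimilarity between samples. The report does not state which distance was used; the sketch below uses Bray-Curtis dissimilarity purely as an example of such a measure, with made-up abundance vectors.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length.

    0.0 means identical composition; 1.0 means no shared taxa.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


# Hypothetical abundances of the same four taxa in two samples.
gut_sample = [40, 10, 0, 50]
lung_sample = [5, 60, 30, 5]
dissimilarity = bray_curtis(gut_sample, lung_sample)
```

An ordination method then places samples so that pairs with a small dissimilarity end up close together in the plot.
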
Abundance of the 30 most abundant OTUs across all samples. At each OTU family's horizontal position, the abundance values for each OTU are stacked in order from greatest to least, separated by a thin horizontal line. Stacking the values in order displays the sum total while still representing the individual OTU abundances.
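
The stacking order described above can be sketched as follows: within each family, OTUs are sorted from greatest to least abundance and each one occupies the segment between the running totals before and after it. The function name, OTU identifiers, and abundances are hypothetical, and this is not the plotting code used for the figure.

```python
from collections import defaultdict
from itertools import accumulate


def stack_by_family(otus):
    """Compute stacked bar segments per family.

    otus: iterable of (otu_id, family, abundance).
    Returns {family: [(otu_id, bottom, top), ...]} ordered greatest to least.
    """
    by_family = defaultdict(list)
    for otu_id, family, abundance in otus:
        by_family[family].append((otu_id, abundance))

    stacks = {}
    for family, members in by_family.items():
        # Largest OTU first, so it sits at the base of the bar.
        members.sort(key=lambda m: m[1], reverse=True)
        tops = list(accumulate(a for _, a in members))
        bottoms = [0] + tops[:-1]
        stacks[family] = [
            (otu_id, bottom, top)
            for (otu_id, _), bottom, top in zip(members, bottoms, tops)
        ]
    return stacks


# Hypothetical OTUs with made-up abundances.
example_otus = [
    ("otu1", "Lachnospiraceae", 30),
    ("otu2", "Lachnospiraceae", 50),
    ("otu3", "Bacteroidaceae", 20),
]
stacks = stack_by_family(example_otus)
```

The top of the last segment in each family equals that family's total abundance, which is why the bar height shows the sum while the segments show the individual OTUs.
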
To capture the species diversity as much as possible, 200 random OTUs among the 1000 most abundant ones are shown in this figure. Any species-level annotation available will be displayed next to the relevant point. OTUs are distinguished in terms of abundance, sample, and phylum by size, shape and color of the points respectively.
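
The selection step described above (a random 200 of the 1000 most abundant OTUs) can be sketched like this. The function name, the fixed seed, and the synthetic abundances are all hypothetical; the report does not say how the random draw was made.

```python
import random


def sample_abundant_otus(total_abundance, top_n=1000, sample_size=200, seed=0):
    """Pick `sample_size` OTUs at random from the `top_n` most abundant ones.

    total_abundance: dict mapping OTU id -> total abundance across samples.
    seed: fixed only so the sketch is reproducible (an assumption).
    """
    ranked = sorted(total_abundance, key=total_abundance.get, reverse=True)
    pool = ranked[:top_n]
    rng = random.Random(seed)
    return rng.sample(pool, min(sample_size, len(pool)))


# Synthetic data: 5000 OTUs where a lower index means a higher abundance.
synthetic_abundance = {f"otu{i}": 5000 - i for i in range(5000)}
picked = sample_abundant_otus(synthetic_abundance)
```

Sampling within the top 1000 keeps the figure readable while still covering OTUs beyond the handful that dominate the dataset.
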
The network helps identify any underlying structures in the co-occurrence of different phyla across all datasets. The graph represents the 200 most abundant OTUs.
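
One simple way to build such a co-occurrence network is to connect two OTUs whenever the sets of samples in which they occur are similar enough. The sketch below uses Jaccard distance on presence/absence with a distance cutoff; the distance measure, the 0.4 cutoff, and all names and data are assumptions for illustration, since the report does not give the settings used.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets (0.0 = identical, 1.0 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cooccurrence_edges(presence, max_dist=0.4):
    """Return (otu_u, otu_v, distance) for OTU pairs within max_dist.

    presence: dict mapping OTU id -> set of samples where it was detected.
    """
    edges = []
    for u, v in combinations(sorted(presence), 2):
        d = jaccard_distance(presence[u], presence[v])
        if d <= max_dist:
            edges.append((u, v, d))
    return edges


# Hypothetical presence/absence data for three OTUs.
presence = {
    "otuA": {"Gut_1", "Gut_4", "Liver_1"},
    "otuB": {"Gut_1", "Gut_4", "Liver_1"},
    "otuC": {"Lung_3"},
}
edges = cooccurrence_edges(presence)
```

In this toy example only otuA and otuB are linked, because they occur in exactly the same samples, while otuC shares no sample with either.
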
Additional plots were generated to give a picture of the datasets' enzyme distributions and the pathways involved. PICRUSt2 was used to generate the functional annotations for the treemap and to add KEGG_IDs for the pathway analysis.
This is the functional annotation of the OTUs. The treemap depicts the actual abundance of enzyme classes, grouped by sample.
The heatmap demonstrates the relative (on a 0-1 scale) abundance of the pathways that our OTUs were found to participate in.
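
The report does not say how the abundances were brought onto the 0-1 scale. One plausible reading is min-max scaling of each pathway's values across samples, sketched below; this is an assumption, and the function name and numbers are hypothetical.

```python
def scale_0_1(values):
    """Min-max scale a list of numbers onto the interval [0, 1].

    If all values are equal, every scaled value is 0.0.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical abundances of one pathway across three samples.
pathway_abundance = [2, 4, 6]
scaled = scale_0_1(pathway_abundance)
```

Under this reading, a value of 1 marks the sample in which a pathway is most abundant and 0 the sample in which it is least abundant, so the colours compare samples within a pathway rather than pathways with each other.
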